A generalized analysis-by-synthesis technique is disclosed. Illustratively, a section of an original signal containing a local maximum energy is identified. A plurality of segments of the original signal containing the local maximum energy are selected based on a plurality of time shifts. These segments are termed "trial original signals." Each trial original signal is compared to a synthesized signal from an adaptive codebook, and a measure of similarity (e.g., a cross-correlation) between these signals is evaluated. A trial original signal for use in coding is determined based on one or more evaluated measures of similarity. A signal reflecting a coded representation of the original signal is generated based on one or more determined trial original signals. The signal reflecting a coded representation of the original signal may be provided by an analysis-by-synthesis coder, such as a CELP coder.

Field of the Invention

The present invention relates generally to speech coding systems and, more specifically, to a reduction of bandwidth requirements in analysis-by-synthesis speech coding systems.

Background of the Invention 

Speech coding systems function to provide codeword representations of speech signals for communication over a channel or network to one or more system receivers. Each system receiver reconstructs speech signals from received codewords. The amount of codeword information communicated by a system in a given time period defines system bandwidth and affects the quality of speech reproduced by system receivers.

Designers of speech coding systems often seek to provide high-quality speech reproduction capability using as little bandwidth as possible. However, requirements for high-quality speech and low bandwidth may conflict and therefore present engineering trade-offs in a design process. This notwithstanding, speech coding techniques have been developed which provide acceptable speech quality at reduced channel bandwidths. Among these are analysis-by-synthesis speech coding techniques.

With analysis-by-synthesis speech coding techniques, speech signals are coded through a waveform matching procedure. A candidate speech signal is synthesized from one or more parameters for comparison to an original speech signal to be encoded. By varying parameters, different synthesized candidate speech signals may be determined. The parameters of the closest matching candidate speech signal may then be used to represent the original speech signal.
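
As an illustration of the waveform-matching loop just described, the following is a minimal Python sketch, not the procedure of any particular coder. It assumes the candidates are given as a plain list of vectors (`search_codebook` is a hypothetical name) and compares each scaled candidate directly with the target, omitting the synthesis and perceptual weighting filters a real coder would apply. For each candidate it uses the least-squares gain and keeps the candidate with the smallest squared error.

```python
def search_codebook(target, codebook):
    """Return (index, gain, error) for the candidate that best matches target.

    For a candidate vector c, the least-squares gain is <t, c> / <c, c> and the
    resulting squared error is ||t||^2 - <t, c>^2 / <c, c>.
    """
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    energy = dot(target, target)
    best = (None, 0.0, energy)  # (index, gain, squared error); None means no candidate helped
    for k, c in enumerate(codebook):
        cc = dot(c, c)
        if cc == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        tc = dot(target, c)
        err = energy - tc * tc / cc
        if err < best[2]:
            best = (k, tc / cc, err)
    return best


codebook = [[1, 0, 0, 0], [0, 1, 0, -1], [1, 1, 1, 1]]
print(search_codebook([2, 2, 2, 2], codebook))  # (2, 2.0, 0.0)
```

The exhaustive search keeps the example short; practical coders restrict or structure the candidate set to limit computation.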

Many analysis-by-synthesis coders, e.g., most code-excited linear prediction (CELP) coders, employ a long-term predictor (LTP) to model long-term correlations in speech signals. (The term "speech signals" means actual speech or any of the residual and excitation signals present in analysis-by-synthesis coders.) During the synthesis process, an LTP is conventionally realized either as an all-pole filter or as an adaptive codebook with gain scaling. As a general matter, long-term correlations in speech signals allow a past reconstructed speech signal to serve as an approximation of a current speech signal. LTPs work by comparing several past speech signals (which have already been coded) to a current (original) speech signal. By such comparisons, the LTP determines which past signal most closely matches the original signal. A past speech signal is identifiable by a delay which indicates how far in the past (from current time) the signal is found. A coder employing an LTP subtracts a scaled version of the closest matching past speech signal (i.e., the best approximation) from the current speech signal to yield a signal with reduced long-term correlation. This signal is then coded, typically with a fixed stochastic codebook (FSCB). The FSCB index and LTP delay, among other parameters, are transmitted to a CELP decoder, which can recover an estimate of the original speech from these parameters.

By modeling long-term correlations of speech, the quality of reconstructed speech at a decoder may be enhanced. This enhancement, however, is not achieved without a significant increase in bandwidth. For example, in order to model long-term correlations in speech, conventional CELP coders may transmit 8-bit delay information every 5 or 7.5 ms (referred to as a subframe). Such time-varying delay parameters require, e.g., between one and two additional kilobits (kb) per second of bandwidth. Because variations in LTP delay may not be predictable over time (i.e., a sequence of LTP delay values may be stochastic in nature), it may prove difficult to reduce the additional bandwidth requirement through improved coding of delay parameters.
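
The bandwidth figures in the preceding paragraph follow directly from sending one 8-bit delay per subframe:

$$
\frac{8\ \text{bits}}{5\ \text{ms}} = 1600\ \text{b/s} = 1.6\ \text{kb/s},
\qquad
\frac{8\ \text{bits}}{7.5\ \text{ms}} \approx 1067\ \text{b/s} \approx 1.07\ \text{kb/s}
$$

Both rates fall within the one-to-two kb/s range stated there.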

One approach to reducing the extra bandwidth requirements of analysis-by-synthesis coders employing an LTP might be to transmit LTP delay values less often and determine intermediate LTP delay values by interpolation. However, interpolation may lead to suboptimal delay values being used by the LTP in individual subframes of the speech signal. If the delay is suboptimal, then the LTP will map past speech signals into the present in a suboptimal fashion. As a result, the difference between past speech mapped into the present and the original signal will be larger than it might otherwise be. The FSCB must then work to undo the effects of this suboptimal time shift rather than perform its normal function of refining waveform shape, and significant audible distortion may result.

Summary of the Invention 

The present invention provides a method and apparatus for reducing bandwidth requirements in analysis-by-synthesis coding systems. In accordance with the present invention, generalized analysis-by-synthesis coding is provided through variation of original signals. Original signal variants are referred to as trial original signals. Use of trial original signals in place of, or as a supplement to, original signals in analysis-by-synthesis coding reduces coding error and bit rate requirements. In the context of speech coding, reduced coding error affords less frequent transmission of LTP delay information and allows for delay interpolation with little or no degradation in the quality of reconstructed speech. The invention is applicable to, among other things, networks for communicating speech information, such as, for example, wireless (e.g., cellular) and conventional telephone networks.

Regarding speech coding, trial original signals are illustratively signals which are perceptually (e.g., audibly) similar to the actual original signal. The degree of audible similarity between a trial original signal and the actual original signal may affect the coded bit rate and the quality of speech synthesized by a receiver (e.g., the lower the similarity, the lower the bit rate and speech quality may be). The original signal, and hence the trial original signals, may take the form of actual speech signals or any of the residual or excitation signals present in analysis-by-synthesis coders.

In an illustrative embodiment of the present invention, trial original signals are generated as time-shifted versions of an original speech signal segment. Measures of similarity (e.g., cross-correlations) between trial original signals and contributions from an adaptive codebook are evaluated. A trial original signal which is either the same as one of the trial original signals or a variant of an original or trial original signal is determined based on one or more evaluated measures of similarity. (In the case of a variant of previously generated trial original signals, the determined trial original signal (i.e., the variant) may correspond to a time shift which falls in between the time shifts which produced previously generated trial original signals.) A signal reflecting a coded representation of the original signal is generated based on the determined trial original signal.

Brief Description of the Drawings

Figure 1 presents a conventional CELP coder.

Figure 2 presents an illustrative embodiment of the present invention.

Figure 3 presents windows of samples used in a correlation process estimating open-loop delay.

Figure 4 presents illustrative time relationships of delay values for use with the embodiment of Figure 2.

Figure 5 presents an illustrative embodiment of an adaptive codebook processor.

Figures 6a-c present illustrative sample time relationships for operation of the adaptive codebook of the embodiment of Figure 2.

Figure 7 presents an illustrative embodiment of the time-shift processor of the embodiment of Figure 2.

Figure 8 presents an illustrative set of initial conditions for the operation of the time-shift processor of Figure 7.

Figure 9 presents a flow diagram of the operation of the time-shift processor of Figure 7.

Figure 10 presents an illustrative segment of original speech used for generating trial original speech signals by time-shifting.

Figure 11 presents an alternative embodiment of the invention.

Figure 12 presents a finite state machine describing the operation of a delay estimator as it concerns time synchrony between original and time-shifted signals.

Figure 13 presents an illustrative receiver/decoder for use with the illustrative coder embodiments presented in Figure 2 and in Figure 11.

Detailed Description

Illustrative Embodiment Hardware

For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising individual functional blocks (including functional blocks labeled as "processors"). The functions these blocks represent may be realized through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For example, the functions of processors presented in Figures 5 and 7 may be provided by a single shared processor. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.)

Illustrative embodiments of the present invention may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16 or DSP32C, read-only memory (ROM) for storing software performing the operations discussed below, and random access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided.

Introduction to Conventional CELP

A conventional analysis-by-synthesis CELP coder is presented in Figure 1. A sampled speech signal, s(i) (where i is the sample index), is provided to a short-term linear prediction filter (STP) 20 of order $N$, optimized for a current segment of speech. Signal x(i) is an excitation obtained after filtering with the STP:

$$
x(i) = s(i) - \sum_{n=1}^{N} a_n\, s(i-n) \tag{1}
$$

where parameters $a_n$ are provided by linear prediction analyzer 10. Since $N$ is usually about 10 samples (for an 8 kHz sampling rate), the excitation signal x(i) generally retains the long-term periodicity of the original signal s(i). An LTP 30 is provided to remove this redundancy.
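
Equation (1) can be sketched in Python as a direct-form FIR filter. This is an illustrative sketch only: `stp_residual` is a hypothetical name, and samples before the start of the block are assumed to be zero rather than carried over from the previous frame as a real coder would do.

```python
def stp_residual(s, a):
    """Short-term prediction residual, Eq. (1): x(i) = s(i) - sum_{n=1}^{N} a_n s(i-n).

    s: speech samples; a: predictor coefficients [a_1, ..., a_N].
    Samples before s[0] are assumed to be zero (no filter memory carried over).
    """
    N = len(a)
    x = []
    for i in range(len(s)):
        prediction = sum(a[n - 1] * s[i - n] for n in range(1, N + 1) if i - n >= 0)
        x.append(s[i] - prediction)
    return x


# A second-order predictor that extrapolates a straight line removes a linear ramp
print(stp_residual([1, 2, 3, 4, 5], [2.0, -1.0]))  # [1, 0.0, 0.0, 0.0, 0.0]
```

Only the first output is nonzero, because it has no past samples to predict from under the zero-memory assumption.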

Values for x(i) are usually determined on a blockwise basis. Each block is referred to as a subframe. The linear prediction coefficients, $a_n$, are determined by the analyzer 10 on a frame-by-frame basis, with a frame having a fixed duration which is generally an integral multiple of subframe durations, and usually 20-30 ms in length. Subframe values for $a_n$ are usually determined through interpolation.

The LTP, typically implemented with an adaptive codebook, determines a gain $\lambda(i)$ and a delay d(i) for use as follows:

$$
r(i) = x(i) - \lambda(i)\, \hat{x}(i - d(i)) \tag{2}
$$

where the $\hat{x}(i - d(i))$ are samples of a speech signal synthesized (or reconstructed) in earlier subframes. Thus, the LTP 30 provides the quantity $\hat{x}(i - d(i))$. Signal r(i) is the excitation signal remaining after $\lambda(i)\hat{x}(i - d(i))$ is subtracted from x(i). Signal r(i) is then coded with an FSCB 40. The FSCB 40 yields an index indicating the codebook vector and an associated scaling factor, $\mu(i)$. Together these quantities provide an excitation which most closely matches r(i).
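
A minimal sketch of a closed-loop LTP search consistent with Eq. (2) follows. It is a simplification under stated assumptions: delays are integers, the gain $\lambda$ is constant over the subframe, and only delays at least as long as the subframe are searched, so every $\hat{x}(i - d)$ comes from already reconstructed samples. The error is measured without perceptual weighting, and `ltp_search` is a hypothetical name.

```python
def ltp_search(x, past, d_min, d_max):
    """Integer-delay LTP search in the sense of Eq. (2).

    x:    current subframe of the excitation x(i), length L
    past: previously reconstructed excitation xhat, most recent sample last
    Only delays d >= L are tried, so xhat(i - d) always lies in `past`.
    Returns (d, gain, r) with r(i) = x(i) - gain * xhat(i - d).
    """
    L = len(x)
    x_energy = sum(t * t for t in x)
    best = None
    for d in range(max(d_min, L), d_max + 1):
        if d > len(past):
            break  # not enough history for this delay
        start = len(past) - d
        v = past[start:start + L]  # xhat(i - d) for the L samples of the subframe
        vv = sum(t * t for t in v)
        if vv == 0.0:
            continue
        xv = sum(p * q for p, q in zip(x, v))
        err = x_energy - xv * xv / vv  # squared error with the least-squares gain
        if best is None or err < best[0]:
            best = (err, d, xv / vv, v)
    if best is None:
        raise ValueError("no usable delay in range")
    _, d, gain, v = best
    r = [xi - gain * vi for xi, vi in zip(x, v)]
    return d, gain, r


past = [0, 0, 1, 2, 3, 0, 0, 0]
print(ltp_search([2, 4, 6], past, 3, 8))  # (6, 2.0, [0.0, 0.0, 0.0])
```

The residual r returned here is what the FSCB would then be asked to match.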

Data representative of each subframe of speech, namely, the LTP parameters $\lambda(i)$ and d(i) and the FSCB index, are collected for the integer number of subframes equaling a frame (typically 2 to 8). Together with the coefficients $a_n$, this frame of data is communicated to a CELP decoder, where it is used in the reconstruction of speech.

A CELP decoder performs the reverse of the coding process discussed above. The FSCB index is received by an FSCB of the receiver (sometimes referred to as a synthesizer), and the associated vector e(i) (an excitation signal) is retrieved from the codebook. Excitation e(i) is used to excite an inverse LTP process (wherein long-term correlations are provided) to yield a quantized equivalent of x(i), $\hat{x}(i)$. A reconstructed speech signal, y(i), is obtained by filtering $\hat{x}(i)$ with an inverse STP process (wherein short-term correlations are provided).
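
The decoder steps can be sketched as follows, assuming an integer delay, gains that are constant over the subframe, and the all-pole (recursive) form of the inverse LTP; the adaptive codebook form discussed later differs when the delay is shorter than the subframe. The function name and argument layout are hypothetical.

```python
def celp_decode_subframe(e, mu, lam, d, xhat_hist, a, y_hist):
    """Decode one subframe: inverse LTP followed by inverse STP.

    e:          fixed-codebook vector e(i) for the subframe
    mu, lam:    FSCB scaling factor and LTP gain (assumed constant over the subframe)
    d:          integer LTP delay, d >= 1 (it may be shorter than the subframe)
    xhat_hist:  past reconstructed excitation, most recent sample last (updated in place)
    a:          STP coefficients [a_1, ..., a_N]
    y_hist:     past reconstructed speech, most recent sample last (updated in place)
    """
    y_out = []
    for ei in e:
        # Inverse LTP restores long-term correlation: xhat(i) = mu*e(i) + lam*xhat(i - d)
        past = xhat_hist[-d] if d <= len(xhat_hist) else 0.0
        xh = mu * ei + lam * past
        xhat_hist.append(xh)
        # Inverse STP (all-pole) restores short-term correlation: y(i) = xhat(i) + sum a_n y(i - n)
        yi = xh + sum(a[n - 1] * y_hist[-n] for n in range(1, len(a) + 1) if n <= len(y_hist))
        y_hist.append(yi)
        y_out.append(yi)
    return y_out


xhat_hist, y_hist = [0.0, 0.0], []
print(celp_decode_subframe([1, 0, 0, 0], 1.0, 0.5, 2, xhat_hist, [0.5], y_hist))
# [1.0, 0.5, 0.75, 0.375]
```

Because each new $\hat{x}(i)$ is appended before the next sample is computed, a delay shorter than the subframe reuses samples decoded earlier in the same subframe.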

In general, the reconstructed excitation $\hat{x}(i)$ can be interpreted as the sum of scaled contributions from the adaptive and fixed codebooks. To select the vectors from these codebooks, a perceptually relevant error criterion may be used. This can be done by taking advantage of the spectral masking existing in the human auditory system. Thus, instead of using the difference between the original and reconstructed speech signals, this error criterion considers the difference of perceptually weighted signals.

The perceptual weighting of signals deemphasizes the formants present in speech. In this example, the formants are described by an all-pole filter in which spectral deemphasis can be obtained by moving the poles inward. This is equivalent to replacing the filter with predictor coefficients $a_1, a_2, \ldots, a_N$ by a filter with coefficients $\gamma a_1, \gamma^2 a_2, \ldots, \gamma^N a_N$, where $\gamma$ is a perceptual weighting factor (usually set to a value around 0.8).

The sampled error signal in the perceptually weighted domain, g(i), is:

$$
g(i) = \hat{x}(i) - x(i) + \sum_{n=1}^{N} \gamma^{n} a_n\, g(i-n) \tag{3}
$$

The error criterion of analysis-by-synthesis coders is formulated on a subframe-by-subframe basis. For a subframe length of $L$ samples, a commonly used criterion is:

$$
e = \sum_{i=j}^{j+L-1} g(i)^{2} \tag{4}
$$

where $j$ is the first sample of the subframe. Note that this criterion weighs the excitation samples unevenly over the subframe; the sample $x(j+L-1)$ affects only $g(j+L-1)$, while $x(j)$ affects all samples of g(i) in the present subframe.
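
Equations (3) and (4) can be evaluated with a short recursion. The sketch below (hypothetical name `weighted_error`) also illustrates the uneven weighting just noted: a unit error in the first sample of a three-sample subframe contributes more to $e$ than the same error in the last sample. By default it assumes that no weighted error is carried over from earlier subframes.

```python
def weighted_error(xhat, x, a, gamma=0.8, g_past=()):
    """Perceptually weighted error of Eqs. (3) and (4) for one subframe.

    g(i) = xhat(i) - x(i) + sum_{n=1}^{N} gamma^n a_n g(i - n)
    e    = sum of g(i)^2 over the subframe
    g_past holds g(i) for samples before the subframe (most recent last);
    the default () assumes no error is carried over from earlier subframes.
    """
    N = len(a)
    w = [gamma ** n * a[n - 1] for n in range(1, N + 1)]  # gamma^n a_n: poles moved inward
    g = list(g_past)
    start = len(g)
    for xh, xi in zip(xhat, x):
        feedback = sum(w[n - 1] * g[-n] for n in range(1, N + 1) if n <= len(g))
        g.append(xh - xi + feedback)
    g_sub = g[start:]
    return g_sub, sum(v * v for v in g_sub)


g1, e1 = weighted_error([1, 0, 0], [0, 0, 0], [1.0], gamma=0.5)  # error in the first sample
g2, e2 = weighted_error([0, 0, 1], [0, 0, 0], [1.0], gamma=0.5)  # error in the last sample
print(e1, e2)  # 1.3125 1.0
```

The first-sample error keeps propagating through the recursion for the rest of the subframe, while the last-sample error is counted only once.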

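The criterion of equation (4) includes the effects of differences in $\hat{x}(i)$ and x(i) prior to $j$, i.e., prior to the beginning of the present subframe. It is convenient to define an excitation in the present subframe to represent this zero-input response of the weighted synthesis filter:

$$
q(i) = \begin{cases}
0, & i < j \\
z(i) - \displaystyle\sum_{n=1}^{\min(N,\, i-j)} \gamma^{n} a_n\, z(i-n), & j \le i < j+L \\
0, & i \ge j+L
\end{cases} \tag{5}
$$

where z(i) is the zero-input response in the present subframe of the perceptually weighted synthesis filter when excited with $\hat{x}(i) - x(i)$ prior to the present subframe. Applied to the weighted synthesis filter from a zero initial state, q(i) reproduces z(i) over the present subframe.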

In the time domain, the spectral deemphasis by the factor $\gamma$ results in a quicker attenuation of the impulse response of the all-pole filter. In practice, for a sampling rate of 8 kHz and $\gamma = 0.8$, the impulse response never has a significant part of its energy beyond 20 samples.

Because of its fast decay, the impulse response of the all-pole filter $1/(1 - \gamma a_1 z^{-1} - \cdots - \gamma^N a_N z^{-N})$ can be approximated by a finite-impulse-response filter. Let $h_0, h_1, \ldots, h_{R-1}$ denote the impulse response of the latter filter. This allows vector notation for the error criterion operating on the perceptually weighted speech.

Because the coders operate on a subframe-by-subframe basis, it is convenient to define vectors with the length of the subframe in samples, $L$. For example, for the excitation signal:

$$
\mathbf{x}(i) = [\,x(i)\;\; x(i+1)\;\; \cdots\;\; x(i+L-1)\,]^{T} \tag{6}
$$



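Further, the spectral-weighting matrix $\mathbf{H}$ is defined as:

$$
\mathbf{H} = \begin{bmatrix}
h_0 & 0 & \cdots & 0 \\
h_1 & h_0 & \cdots & 0 \\
\vdots & h_1 & \ddots & \vdots \\
h_{R-1} & \vdots & \ddots & h_0 \\
0 & h_{R-1} & \ddots & h_1 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & h_{R-1}
\end{bmatrix} \tag{7}
$$

$\mathbf{H}$ has dimensions $(L+R-1) \times L$. Thus, the vector $\mathbf{H}\mathbf{x}(i)$ approximates the entire response of the IIR filter $1/(1 - \gamma a_1 z^{-1} - \cdots - \gamma^N a_N z^{-N})$ to the vector $\mathbf{x}(i)$. With these definitions, and with $\hat{\mathbf{x}}(i)$ and $\mathbf{q}(i)$ defined analogously to equation (6), an appropriate perceptually weighted criterion is:

$$
e = \left\| \mathbf{H}\left(\hat{\mathbf{x}}(i) + \mathbf{q}(i) - \mathbf{x}(i)\right) \right\|^{2} \tag{8}
$$

With the current definition of $\mathbf{H}$, the error criterion of equation (8) is of the autocorrelation type (note that $\mathbf{H}^{T}\mathbf{H}$ is Toeplitz). If the matrix $\mathbf{H}$ is truncated to be square $L \times L$, equation (8) equals equation (4), which is the more common covariance criterion, as used in the original CELP.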
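
The construction of $\mathbf{H}$ and the vector criterion can be sketched in plain Python. The sketch assumes the FIR approximation is obtained by truncating the weighted filter's impulse response to $R$ samples; all function names are hypothetical. The usage lines check numerically that $\mathbf{H}^{T}\mathbf{H}$ has constant diagonals, as the Toeplitz property requires.

```python
def weighted_impulse_response(a, gamma, R):
    """First R samples h_0..h_{R-1} of 1 / (1 - sum_n gamma^n a_n z^-n)."""
    w = [gamma ** n * a[n - 1] for n in range(1, len(a) + 1)]
    h = []
    for k in range(R):
        value = 1.0 if k == 0 else 0.0  # unit impulse input
        value += sum(w[n - 1] * h[k - n] for n in range(1, len(w) + 1) if k - n >= 0)
        h.append(value)
    return h


def convolution_matrix(h, L):
    """The (L + R - 1) x L matrix H of Eq. (7): column c is h delayed by c samples."""
    R = len(h)
    return [[h[r - c] if 0 <= r - c < R else 0.0 for c in range(L)]
            for r in range(L + R - 1)]


def criterion(H, xhat, q, x):
    """e = ||H (xhat + q - x)||^2, i.e. Eq. (8)."""
    v = [p + s - t for p, s, t in zip(xhat, q, x)]
    y = [sum(row[c] * v[c] for c in range(len(v))) for row in H]
    return sum(t * t for t in y)


h = weighted_impulse_response([1.0], gamma=0.5, R=4)  # [1.0, 0.5, 0.25, 0.125]
H = convolution_matrix(h, L=3)
HtH = [[sum(H[r][i] * H[r][j] for r in range(len(H))) for j in range(3)] for i in range(3)]
print(HtH[0][0] == HtH[1][1] == HtH[2][2])  # True: constant main diagonal
print(criterion(H, [1.0, 0.0, 0.0], [0.0] * 3, [0.0] * 3))  # 1.328125
```

Because $\mathbf{H}$ keeps the full $L+R-1$ output rows, the tail of each column is never cut off, which is what makes every column carry the same energy.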

An Illustrative Embodiment for CELP Coding 

Figure 2 presents an illustrative embodiment of the present invention as it may be applied to CELP coding. A speech signal in digital form, s(i), is presented for coding. Signal s(i) is provided to a conventional linear predictive analyzer 100, which produces linear predictive coefficients, $a_n$. Signal s(i) is also provided to a conventional linear prediction filter (or "short-term predictor" (STP)) 120, which operates according to a process described by Eq. (1), and to a conventional delay estimator 140.

Delay estimator 140 operates to provide an estimated delay value to the adaptive codebook processor 150. To determine delay information valid at a particular sample time, delay estimator 140 performs conventional correlation of a window of samples of s(i), centered about the particular sample in question, with each of a multiplicity of windows of the same length. The windows involved in this correlation are illustrated in Figure 3.

Figure 3 presents the demarcations for frames (F) and constituent subframes (SF) of samples of signal s(i) (actual sample values of s(i) have been omitted for clarity). Shown are three frames, $F_{n-1}$ (the past frame), $F_n$ (the current frame), and $F_{n+1}$ (the next frame). Each of these frames comprises 160 samples of signal s(i).

The location of frame boundaries is provided by time shift processor 200, discussed below. Time shift processor 200 provides a sample location, dp, indicating the end of a subframe of original speech signal s(i). Delay estimator 140 simply keeps track of the subframe boundaries of original speech to know when a frame boundary is reached (such a frame boundary is at an integral multiple of subframe boundaries). Because delay estimator 140 operates on a frame of speech prior to the operation of the time shift processor 200 on the same frame of speech, delay estimator 140 must predict the position of future frame boundaries. It does this by adding a fixed number of samples equal to a frame length (e.g., 160 samples) to the last frame boundary provided by the time shift processor 200.

Assume delay estimator 140 is to determine a value for delay, $M$, valid at the boundary between the current and next frames of s(i), $M(FB_{n+1})$. To do this, estimator 140 stores in its memory a window of 160 signal samples surrounding this boundary (estimator 140 must wait to receive samples of signal s(i) valid in the next frame). This window of samples is denoted as window A. Next, estimator 140 performs a correlation computation with samples of s(i) in window B1, the first of 140 other windows of s(i). Window B1 is a window of 160 samples beginning 20 samples earlier than the beginning of window A and ending 20 samples earlier than the end of window A. A correlation value associated with window B1 is stored in memory. The correlation process is repeated with window B2, a 160-sample window beginning one sample earlier in time than window B1. Correlation computations are performed for each of the next 138 windows, each window distinct from the one before by one sample.

As shown in Figure 3, estimator 140 must have enough memory to store what is essentially two frames of signal samples. If $D$ is the largest delay value allowed, then the memory should extend $D$ samples prior to the beginning of window A. When $D = 160$, in order to compute an estimated delay valid at $FB_{n+1}$, estimator 140 must store samples of s(i) from the beginning of the third subframe, $SF_2$, of frame $F_{n-1}$ to the end of the second subframe, $SF_1$, of frame $F_{n+1}$. Delay, $M$, is determined by estimator 140 based on the B window of samples having the greatest correlation with the samples of window A. That is, the delay is equal to the number of samples by which the most correlated B window is shifted in time from window A.
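
The open-loop search just described can be sketched in Python. This is an illustration under assumptions: the correlation is taken to be a plain inner product (normalizing by window energy is a common variant), the stored samples are a flat list, and `open_loop_delay` is a hypothetical name. The defaults mirror the text: 160-sample windows and 140 candidate windows shifted back by 20 through 159 samples.

```python
def open_loop_delay(s, center, win=160, min_shift=20, max_shift=159):
    """Estimate the delay M valid at sample index `center` (e.g., a frame boundary).

    Window A holds `win` samples centered on `center`; each candidate window B
    has the same length and is shifted back by min_shift..max_shift samples.
    The delay is the shift of the B window that correlates best with A.
    """
    a_start = center - win // 2
    if a_start - max_shift < 0 or a_start + win > len(s):
        raise ValueError("not enough stored samples around center")
    A = s[a_start:a_start + win]
    best_shift, best_corr = None, float("-inf")
    for shift in range(min_shift, max_shift + 1):
        B = s[a_start - shift:a_start - shift + win]
        corr = sum(p * q for p, q in zip(A, B))  # plain inner product
        if corr > best_corr:  # strict '>' keeps the shortest shift on ties
            best_shift, best_corr = shift, corr
    return best_shift


s = [1.0 if i % 57 == 0 else 0.0 for i in range(700)]  # pulse train, period 57
print(open_loop_delay(s, center=400))  # 57
```

In this example a shift of 114 samples correlates equally well; the strict comparison resolves such multiple-period ties toward the shortest delay.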

The delay estimator 140 determines a frame boundary delay estimate, $M$, once per frame. Delay estimator 140 further determines a delay value, $m$, valid at a fixed number of samples into each subframe (e.g., 10 samples), by conventional linear interpolation of delay values valid at frame boundaries. For this purpose, the delay value required at 10 samples into the next frame is set equal to the delay value at the frame boundary.

The timing associated with the delay values provided by delay estimator 140 is illustrated in Figure 4. As shown in the figure, delay values valid at the frame boundaries surrounding frame n are $M(FB_n)$ and $M(FB_{n+1})$. Delay values valid at a fixed number of samples past each subframe boundary (SB) within frame n are indicated as $m_n(k)$, $k = 0, 1, 2, 3$. These values of $m_n(k)$ are determined by interpolation as discussed above. Delay values $m_n(k)$ are provided to the adaptive codebook processor 150. As will be discussed below, the adaptive codebook processor 150 uses this delay information to provide an adaptive codebook contribution to the time shift processor 200.
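
One reading of this interpolation rule is sketched below: $M(FB_n)$ is treated as valid 10 samples into frame n and $M(FB_{n+1})$ as valid 10 samples into frame n+1, and the subframe values $m_n(k)$ are interpolated linearly between those two points. The reading, the 40-sample subframe length, and the name `subframe_delays` are assumptions for illustration.

```python
def subframe_delays(M_n, M_next, frame_len=160, sub_len=40, offset=10):
    """Delays m_n(k), valid `offset` samples into each subframe k of frame n.

    Assumed reading: M(FB_n) is treated as valid `offset` samples into frame n
    and M(FB_{n+1}) as valid `offset` samples into frame n + 1; the subframe
    values are linearly interpolated between these two points.
    """
    n_sub = frame_len // sub_len
    delays = []
    for k in range(n_sub):
        t = k * sub_len + offset         # sample position within frame n
        frac = (t - offset) / frame_len  # 0 at M(FB_n), 1 at M(FB_{n+1})
        delays.append(M_n + (M_next - M_n) * frac)
    return delays


print(subframe_delays(40, 48))  # [40.0, 42.0, 44.0, 46.0]
```

With a 160-sample frame split into four subframes, each subframe advances the delay by one quarter of the frame-to-frame change.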

The Adaptive Codebook Processor 

The adaptive codebook processor 150 provides an estimate of a current subframe of speech (to be coded) to the time shift processor 200 based on delay estimates, $m_n(k)$, from the delay estimator 140 and past reconstructed speech signals from the CELP process. The adaptive codebook processor 150 operates by using delay values, $m_n(k)$, to determine a delay pointer, d(i), to past reconstructed speech signals stored in the memory of processor 150. Selected past speech samples, $\hat{x}(i)$, are then provided to processor 200 as an estimate of the current subframe of speech to be coded. For each subframe of original speech to be coded, adaptive codebook processor 150 provides a corresponding subframe of speech samples plus a fixed number of extra samples which extend into the next subframe. Illustratively, this fixed number of extra samples equals 10.

Figure 5 presents an illustrative realization of the adaptive codebook processor 150. The realization comprises processor 155 and RAM 157. Processor 155 receives past reconstructed speech signals, $\hat{x}(i)$, and stores them in RAM 157 for use in computing current and next subframe speech samples. Processor 155 also receives delay values, $m_n(k)$, from delay estimator 140, which are used in the computation of such sample values. Processor 155 provides such computed sample values, $\hat{x}(i)$, to time shift processor 200 for use in the generation of trial original signals.

Each sample of speech provided to the time shift processor 200 is determined as follows. First, a delay pointer, d(i), valid for the sample in question (that is, the sample to be provided to the time shift processor 200) is determined by processor 155. This is done by interpolating between a pair of delay values, $m_n(k)$ (provided by delay estimator 140), which surround the sample in question. The interpolation procedure used by processor 155 to provide the delay pointers, d(i), is conventional linear interpolation of the provided delay values, $m_n(k)$. Next, processor 155 uses the delay pointer, d(i) (valid for the sample in question), as a pointer backward in time to an earlier speech sample which is to be used in the current frame as the value of the sample in question. Such earlier samples are stored in RAM 157. In general, the delay pointer, d(i), will not point exactly to a past sample. Instead, d(i) will likely point somewhere between consecutive past samples. Under such circumstances, processor 155 interpolates past samples to determine a past sample value valid at the moment in time to which the delay pointer refers. The interpolation technique used by processor 155 to determine past sample values is conventional bandlimited interpolation, such as that described by Rabiner and Schafer, Digital Processing of Speech Signals, pp. 26-31 (1978). The interpolation filter realized by processor 155 illustratively employs 20 taps on either side of the past sample closest to the time indicated by the delay value.

Figures 6a-c illustrate the process by which the adaptive codebook processor 150 selects past samples for use in a current (and next) subframe. For clarity of presentation, Figures 6a-c assume that a computed value of d(i) points exactly to a past sample value, rather than to a point in between past sample values. Also, it will be assumed without loss of generality that the delay values are shorter than the subframe length.

As shown in Figure 6a, the samples to be provided to time shift processor 200 include samples in a current subframe and a fixed number of samples in the next subframe. Processor 155 receives a delay value for the current subframe, $m_{curr}$, from the delay estimator 140 and has stored in its memory 157 a delay value for the previous subframe, $m_{prev}$. To determine the value of each sample, $\hat{x}(i)$, of the current subframe located prior to the point at which $m_{curr}$ is valid, processor 155 determines a delay pointer, d(i), valid at the sample time i of the sample in question. This is done by linear interpolation to the point in time when the sample is valid, using delay $m_{curr}$ and the last delay value received from estimator 140, $m_{prev}$. After this delay pointer, d(i), has been determined, processor 155 computes, by bandlimited interpolation of samples in its memory 157, the sample value valid at a point in time which is d(i) samples prior to the sample in question, i.e., $\hat{x}(i - d(i))$. This sample value is then inserted into a memory location reserved for the current subframe sample in question.

In the example of Figure 6, the subframe length is longer than the delay values. The process by which a given sample in the current subframe is determined is based on determining a delay pointer and looking backward in time for a sample value to use as the given sample value. Thus, segments of reconstructed speech may be essentially repeated using bandlimited interpolation within the current subframe. So, for example, in Figure 6b, a given sample, $\hat{x}(i)$, takes its value from a previously determined sample which precedes it in time by a delay d(i), i.e., $\hat{x}(i - d(i))$. This delay is determined as described above, except that the delay values which are interpolated are the delays from the current subframe, $m_{curr}$, and the next subframe, $m_{next}$, since these delays surround sample $\hat{x}(i)$. Repeating signal segments with constant gain when the delay is shorter than the subframe length is what distinguishes the adaptive codebook procedure from LTP filtering procedures.

As shown in Figure 6c, the extra samples in the next subframe are determined in the same fashion as those in Figure 6b. In this case, samples from the current subframe are used to provide values for samples in the next subframe.

In practice, the above-described procedure of the adaptive codebook processor 150 may be realized by first computing delay pointer values, d(i), for all sample times of the current subframe and of the portion of the next subframe in question. Then, for each sample time, i, of the present or next subframe needing a sample value, $i - d(i)$ is used as a reference to a past time at which a sample is "located." In general, there will not be a sample located at time $i - d(i)$. Therefore, bandlimited interpolation of samples surrounding time $i - d(i)$ will be required. Once the bandlimited interpolation is performed, generating a sample value at $i - d(i)$, that sample value is assigned to time i. This process may be repeated in a recursive process for each sample in the present or next subframe as needed.
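
The per-sample procedure can be sketched in Python. This is an assumption-laden illustration, not the realization of processor 155: the bandlimited interpolator is taken to be a Hann-windowed sinc with 20 taps on each side, the delay pointer is linearly interpolated between two delay values, and samples that have not yet been generated are treated as zero, which introduces some error when a fractional delay is shorter than the interpolator's half-length. The function names are hypothetical.

```python
import math


def bandlimited_sample(buf, t, taps=20):
    """Estimate the signal value at fractional position t from the samples in buf.

    Windowed-sinc interpolation over `taps` samples on either side of the sample
    nearest t (a Hann window is assumed here). Positions outside buf count as zero.
    """
    n0 = int(round(t))
    acc = 0.0
    for n in range(n0 - taps, n0 + taps + 1):
        if 0 <= n < len(buf):
            u = t - n
            sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
            window = 0.5 + 0.5 * math.cos(math.pi * u / (taps + 1))
            acc += buf[n] * sinc * window
    return acc


def adaptive_codebook(history, m_prev, t_prev, m_curr, t_curr, n_out, taps=20):
    """Map past reconstructed excitation into the next n_out samples.

    history: past reconstructed excitation, most recent sample last; output
             sample i (time i) sits at position len(history) + i of the buffer.
    The delay pointer d(i) is linearly interpolated (or extrapolated) from m_prev,
    valid at time t_prev, and m_curr, valid at time t_curr (t_curr != t_prev).
    Each output is the value at time i - d(i). Outputs are appended to the buffer
    as they are produced, so a delay shorter than n_out repeats the segment.
    """
    buf = list(history)
    base = len(history)
    out = []
    for i in range(n_out):
        d = m_prev + (m_curr - m_prev) * (i - t_prev) / (t_curr - t_prev)
        value = bandlimited_sample(buf, base + i - d, taps)
        buf.append(value)
        out.append(value)
    return out


history = [0.0] * 50 + [1.0, 2.0, 3.0, 4.0, 5.0]
out = adaptive_codebook(history, 5, 0, 5, 10, 10)  # constant 5-sample delay
print([round(v, 6) for v in out])  # [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

With a delay of 5 samples and a 10-sample output, the last five outputs are read from the first five, which is the segment repetition shown in Figures 6b and 6c.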

Once the adaptive codebook processor 150 has determined samples for use in the current subframe and a fixed portion of the next subframe, those samples are provided to the time shift processor 200 for use as a basis for determining a shifted original signal for use in a CELP coding process. The samples provided to the time shift processor are referred to as the adaptive codebook contribution to the analysis-by-synthesis process of CELP coding.

It should be understood that an all-pole filter may be used in place of the adaptive codebook realization of an LTP. However, the adaptive codebook realization is particularly well suited to situations where, as illustrated here, delay values are generally less than the length of a subframe. This is because an adaptive codebook realization does not require a determined value of LTP gain (here, codebook gain) simply to provide an LTP contribution in the current subframe. This gain may be determined later. Unlike the case of the adaptive codebook, an all-pole filter realization of an LTP requires the solution of a nonlinear equation to obtain a value for the filter gain when the delay is less than the subframe length.

The Time-Shift Processor

The time shift processor 200 determines how to shift segments of an original speech signal such that it may be coded (by an analysis-by-synthesis coding process, such as CELP) with less error than if the original signal were always used for coding. To time-shift an original speech signal, the time shift processor 200 first identifies within the original speech signal a local maximum of original speech signal energy. In the illustrative embodiment described below, processor 200 selects a plurality of overlapping segments of the original speech signal, each of which includes the identified local maximum signal energy. Processor 200 compares each selected segment with a segment of the adaptive codebook contribution (provided by the adaptive codebook processor 150). This comparison is made to determine the original speech signal segment which most closely matches the segment of the adaptive codebook contribution. When the segment of the original speech signal which best matches the segment of the adaptive codebook contribution is determined, this segment of original speech is used in the formation of a shifted original speech signal for coding by a CELP process.

As shown in Figure 2, the time shift processor 200 receives an original residual speech signal, x(i), from the STP 120, and provides a time-shifted residual, x̃(i), for use in the CELP coding process. As shown in Figure 7, time shift processor 200 illustratively comprises processor 210; conventional buffer memories 220, 230, and 240; conventional ROM 250 for the storage of processor 210 programs; and conventional RAM 260 for the storage of processor 210 results.

The operation of time shift processor 200 will be explained with reference to Figure 8, which presents an illustrative starting point for processor 200 operation on speech signals, and Figure 9, which presents an illustrative flow diagram for the operation of processor 210.

As shown in Figure 8, processor 200 begins operation having received, in buffer 220, reconstructed speech representing the adaptive codebook contribution from the adaptive codebook processor 150. As discussed above, this adaptive codebook contribution comprises samples of past reconstructed speech which have been mapped by processor 150 into the current subframe and a fixed portion of the next subframe (see Figure 6 and associated discussion). This buffer of reconstructed speech is loaded into RAM 260 for use by processor 210. A pointer, dp_l, is maintained by processor 210 and stored in RAM 260 to indicate the end of the latest subframe for which both the adaptive codebook and FSCB contributions have been determined. The length of such subframes, subframe_l, is constant and is maintained in memory, e.g., ROM 250. As a result of prior operation of processor 210, a time-shifted residual, x̃(i), has been created up to a point in time identified by a pointer dpm (pointer dpm is always greater than or equal to dp_l). Moreover, a portion of the original residual signal, including that associated with the current subframe, has been received by buffer 230 and stored in RAM 260. Processor 210 maintains (in RAM 260) a value, acc_shift, representing the sample displacement (or accumulated shift) between the last sample in the shifted residual signal and the corresponding sample in the original residual speech signal. (At initialization, the above-described status is modified to include dpm = dp_l and acc_shift = 0.)

Given this set of conditions, the time shift processor 200 operates to determine a shifted residual signal 
for the current subframe (and possibly a portion of the next subframe, depending on the circumstances) 
which best matches the adaptive codebook contribution. 

Figure 9 presents a flow diagram illustrating the operation of the processor 210 of Figure 7. According to Figure 9, the first task performed by processor 210 is to determine whether the time-shifted residual, x̃(i), has been extended up to or beyond the end of the current subframe. As shown in Figure 8, the extent to which the time-shifted residual has been extended is given by pointer dpm. The end of the current subframe is indicated by the sum of the current subframe pointer dp_l and the fixed subframe length, subframe_l. If dpm < dp_l + subframe_l, further processing is performed to extend the shifted residual; otherwise, no further shift processing is required for the current subframe (see step 305).
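The bookkeeping described above, together with the step 305 test, can be sketched as a small C structure. The type and function names are hypothetical; the field names follow the pointers and values named in the text, and the initialization follows the stated rule dpm = dp_l, acc_shift = 0.

```c
#include <assert.h>

/* Hypothetical bookkeeping for processor 210; field names follow the text. */
typedef struct {
    int   dpl;        /* dp_l: end of latest subframe with ACB and FSCB done */
    int   dpm;        /* end of the time-shifted residual built so far */
    float acc_shift;  /* accumulated shift, original versus shifted residual */
    int   subframe_l; /* fixed subframe length in samples */
} ts_state;

/* Initialization as described in the text: dpm = dp_l and acc_shift = 0. */
static void ts_init(ts_state *s, int dpl, int subframe_l)
{
    assert(subframe_l > 0);
    s->dpl = dpl;
    s->dpm = dpl;
    s->acc_shift = 0.0f;
    s->subframe_l = subframe_l;
}

/* Step 305: shift processing continues while dpm < dp_l + subframe_l. */
static int ts_needs_extension(const ts_state *s)
{
    return s->dpm < s->dpl + s->subframe_l;
}
```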

If further shift processing is required, processor 210 determines the location of maximum energy in a segment of the original residual speech signal, x(i) (see steps 310-375). Ordinarily, the location of maximum energy corresponds to the location of a pitch-pulse of voiced speech. However, this is not necessarily the case. Regardless of whether the maximum energy is associated with a pitch-pulse or some other signal feature (such as, e.g., energetic noise), the search for the maximum energy location is made so that shifts in the original signal will best align an energetically significant feature in the original speech with a significant feature in the adaptive codebook contribution.

The beginning of the segment of the original residual speech signal to be searched is defined with respect to a pointer to an original residual speech signal sample. This sample corresponds to the sample identified by pointer dpm in the shifted residual signal. This original residual sample pointer, dpm', is determined as the sum of sample pointer dpm and the accumulated shift between x(i) and x̃(i): dpm' = dpm + acc_shift (see step 310). The beginning of the interval to be searched, designated by the pointer offset, is then computed (see step 315). Next, the length of the interval to be searched is defined (see step 320). (In the appendix routine cshiftframe, this first search begins 0.25 delay before dpm' and spans 1.5 delay.)

The location of maximum energy in the segment of x(i) is then determined (see step 325). This determination is made with use of a five-sample window. This window, centered about the i-th sample of the original residual speech signal, defines the samples of the original residual used in an energy computation. The energy at sample location i is determined as the sum of the squares of the samples in the window. The energy at the (i + 1)-th sample location is determined in the same fashion, but with the window moved one sample later in time so that the center window location now contains the (i + 1)-th sample. The energy of each sample location in the segment is determined in the same fashion. Equivalently, the energy of the samples in a current window may be determined as the energy of the immediately preceding window, minus the energy of the sample shifted out of the window, plus the energy of the sample shifted into the window. The sample location having the maximum energy determined in this fashion is identified by a pointer, location.
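The recursive window-energy update described above can be sketched in C as follows. The function name and argument order are assumptions, although the appendix routine maxeloc takes a similar set of arguments and is called with a half-window length of 2 (a five-sample window). The caller must ensure that every window lies inside the buffer.

```c
#include <assert.h>

/* Sketch of step 325: energy at centre c is the sum of squares over the
 * (2*half + 1)-sample window centred on c. The window slides one sample at
 * a time and its energy is updated recursively. x[start - half] through
 * x[start + length - 1 + half] must be valid. Ties keep the earliest centre. */
static int max_energy_loc(const float *x, int start, int length, int half,
                          float *max_energy)
{
    assert(length >= 1 && half >= 0);
    float ener = 0.0f;
    for (int i = start - half; i <= start + half; i++)
        ener += x[i] * x[i];

    int best = start;
    float best_ener = ener;
    for (int c = start + 1; c < start + length; c++) {
        /* drop the sample leaving the window, add the one entering it */
        float out = x[c - half - 1];
        float in = x[c + half];
        ener += in * in - out * out;
        if (ener > best_ener) {
            best_ener = ener;
            best = c;
        }
    }
    if (max_energy)
        *max_energy = best_ener;
    return best;
}
```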

Once the segment of the original residual signal, x(i), has been searched for the sample having the maximum energy in the segment, processor 210 determines whether this maximum-energy sample has already been considered in the previous subframe (and is thus not a maximum of interest). This is done by determining whether location precedes dpm' (see step 330).

If location precedes dpm', another search is performed by processor 210. In this case, however, the segment searched begins at a sample specified as offset = location + 0.75 delay (see step 335), and is of duration 0.5 delay. The value delay is provided by delay estimator 140 as the delay valid at the beginning of the current subframe, M(FB_n). Since significant pitch-pulse energy features in the original residual signal are likely separated by one delay period, the computation of a new offset allows the search to skip ahead 0.75 delay and likely find a maximum energy feature within a segment of length 0.5 delay. The sample location of maximum energy is determined as described above with reference to step 325 (see step 345).

If location does not precede dpm', then the first pitch-pulse beyond dpm' has likely been found, and the flow of control jumps to step 350.

If the location of maximum signal energy determined at either step 325 or step 345 follows dpm' + delay, then it is likely, but not certain, that a pitch-pulse located after dpm' but before dpm' + delay has been missed by the searches performed so far by processor 210 (see step 350). In this case, another segment of the original residual signal is defined and the location of the maximum energy therein is determined. If the location of maximum signal energy determined at either step 325 or step 345 precedes dpm' + delay, then the flow of control jumps to step 380.

Assuming step 350 results in the need to search another segment of the original residual speech signal, this segment is determined to begin at offset = location - 1.25 delay (see step 355) and to extend for length = 0.5 delay (see step 360). The location of the maximum energy is determined as described above with reference to step 325, but the sample pointer to this location is saved as location 2 (see step 365).

If the location of maximum energy (location 2) is subsequent to dpm', then location 2 identifies the location of the first pitch-pulse beyond dpm', and location is set equal to location 2 (see steps 370 and 375). If, on the other hand, the location of maximum energy is not beyond dpm', then location 2 is not the first pitch-pulse beyond dpm', and location retains the value assigned at either step 325 or step 345 (under such circumstances, pointer location is not overwritten by the operation of step 365).
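The three searches of steps 310-375 can be sketched in C as follows. The placement of the first search (starting 0.25 delay before dpm' and spanning 1.5 delay) is taken from the appendix routine cshiftframe; the function names, the brute-force window energy (rather than the recursive update), and the use of original-signal coordinates throughout are assumptions for illustration. The caller must ensure that x is valid over every searched window plus two samples on each side.

```c
#include <assert.h>

/* Energy of the (2*half + 1)-sample window centred on sample c. */
static float win_energy(const float *x, int c, int half)
{
    float e = 0.0f;
    for (int i = c - half; i <= c + half; i++)
        e += x[i] * x[i];
    return e;
}

/* Centre of maximum window energy among off .. off+len-1 (first wins ties). */
static int max_loc(const float *x, int off, int len, int half)
{
    assert(len >= 1);
    int best = off;
    float best_e = win_energy(x, off, half);
    for (int c = off + 1; c < off + len; c++) {
        float e = win_energy(x, c, half);
        if (e > best_e) {
            best_e = e;
            best = c;
        }
    }
    return best;
}

/* Steps 310-375 in original-signal coordinates: dpm_p is dpm' and
 * delay is the pitch delay in samples. Returns location. */
static int find_first_pulse(const float *x, int dpm_p, float delay)
{
    const int half = 2;  /* five-sample energy window */
    int loc = max_loc(x, dpm_p - (int)(0.25f * delay),
                      (int)(1.5f * delay), half);           /* step 325 */

    if (loc < dpm_p)                                        /* step 330 */
        loc = max_loc(x, loc + (int)(0.75f * delay + 0.5f),
                      (int)(0.5f * delay), half);           /* steps 335-345 */

    if (loc > dpm_p + delay) {                              /* step 350 */
        int loc2 = max_loc(x, (int)(loc - 1.25f * delay + 0.5f),
                           (int)(0.5f * delay), half);      /* steps 355-365 */
        if (loc2 >= dpm_p)                                  /* steps 370-375 */
            loc = loc2;
    }
    return loc;
}
```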

At this point, the location of the first pitch-pulse (or energy maximum) in a segment of the original residual has been found. Next, processor 210 defines a segment of the original residual signal containing this location by setting pointers to samples in the signal. These pointers specify the beginning (sf_start) and end (sf_end) of the segment containing the determined location. This segment is defined for later use as part of the process of aligning (or shifting) original residual speech to best match an adaptive codebook contribution.

First, default values for the segment pointers are set by processor 210. Pointer sf_start is set equal to dpm', the sample location corresponding to dpm + acc_shift (see step 380). This value for sf_start corresponds to an additional accumulated shift between x(i) and x̃(i) of zero. That is, use of a section of x(i) beginning at dpm' (= sf_start) adds nothing to the accumulated shift between the original and shifted residual signals.

Pointer sf_end is set to location + extra. The value extra is a constant stored in memory (e.g., ROM 250) and is equal to a fixed number of samples, e.g., 10 samples. Use of extra guarantees that the pitch-pulse (or maximum energy) of the original residual speech will not fall at the end of the segment of the original residual identified by these pointers (see step 380).

The default value of pointer sf_end may be overwritten under certain circumstances. If the default value of sf_end would mean that the segment of original residual speech would extend significantly beyond the end of the adaptive codebook contribution, the pointer sf_end is set to dp_l' + subframe_l + extra, where subframe_l is a constant equal to the number of samples in a fixed adaptive codebook subframe, as discussed above (see steps 385 and 390).

The value of sf_end may be further overwritten if the location of the identified pitch-pulse (or major energy) is significantly beyond the end of the adaptive codebook subframe. Under such circumstances, the segment is deemed to end at the adaptive codebook subframe boundary (see steps 395 and 400). Note that such a definition of sf_end means that the location of the pitch-pulse (or major energy) is later than the end of the segment. Therefore, the segment no longer contains the pitch-pulse.

At this point, the location of the identified pitch-pulse (or maximum energy) is checked to determine whether it falls outside the range of samples beginning at sf_start and ending at sf_end - 1 (see step 405). If so, x̃(i) may be extended with samples obtained by bandlimited interpolation of x(i) without changing acc_shift (that is, the flow of control may jump to step 480). Otherwise, shifting is performed (see steps 410-475).
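The pointer settings of steps 380-405 can be collected into one function. This C sketch follows the appendix routine cshiftframe, which uses extra/2 as the threshold for "significantly beyond" the end of the subframe; the struct and function names are hypothetical, all positions are in one sample coordinate (dpm_p and dpl_p stand for dpm' and dp_l'), and the segment is treated as the half-open range from sf_start up to, but not including, sf_end.

```c
#include <assert.h>

/* Hypothetical result of steps 380-405. */
typedef struct {
    int sf_start;      /* first sample of the zero-shift segment */
    int sf_end;        /* one past the last sample of the segment */
    int shift_allowed; /* 0 if location lies outside the segment (step 405) */
} shift_segment;

static shift_segment define_segment(int loc, int dpm_p, int dpl_p,
                                    int subframe_l, int extra)
{
    assert(subframe_l > 0 && extra >= 0);
    shift_segment s;
    s.sf_start = dpm_p;                          /* step 380: zero extra shift */
    s.sf_end = loc + extra;                      /* step 380 */
    if (s.sf_end > dpl_p + subframe_l + extra)   /* steps 385-390 */
        s.sf_end = dpl_p + subframe_l + extra;
    if (loc >= dpl_p + subframe_l + extra / 2)   /* steps 395-400 */
        s.sf_end = dpl_p + subframe_l;
    /* step 405: shifting only if sf_start <= loc <= sf_end - 1 */
    s.shift_allowed = (loc >= s.sf_start && loc < s.sf_end);
    return s;
}
```

When shift_allowed is 0, a caller following Figure 9 would skip to step 480 and keep acc_shift unchanged; the appendix achieves the same effect by setting the shift range maxshift2 to zero.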

Assuming the location of the identified pitch-pulse (or major energy) is not outside the range defined above, processor 210 determines the set (or segment) of L samples of x(i), within a specified range of samples about the segment defined by sf_start and sf_end, which most closely matches an L-length section of the adaptive codebook contribution beginning at dpm and ending at dpm + L. This L-length segment of x(i) may comprise those L samples of the segment of x(i) defined by sf_start and sf_end, but may also comprise samples (obtained by bandlimited interpolation) of a segment which is shifted with respect to sf_start and sf_end, depending upon how closely a given L-length segment of x(i) matches the L-length section of x̂(i). As predicates to this determination, a limit on the range of possible sample shifts (see step 410) and a sample length, L, are determined (see step 415). The "closeness" (i.e., a measure of similarity) between L-length segments of x(i) and the adaptive codebook contribution x̂(i) is determined through a cross-correlation of these signals (see step 425). (It will be understood that other measures of similarity, such as a difference or error signal, may also be used.) The selection of L-length segments of x(i) for use in a cross-correlation with a segment of x̂(i) may be advantageously described with reference to Figure 10.

Figure 10 presents an illustrative segment of original residual speech signal x(i) which was located as described previously with reference to steps 310-400. The segment begins at sample sf_start and ends at sample sf_end. The pitch-pulse is at sample location, with the distance between samples location and sf_end equal to extra. As discussed above, the samples of x(i) falling within the segment defined by pointers sf_start and sf_end correspond to a shift of zero. Shifted segments of x(i) are defined with respect to this zero-shift position. Each shifted segment is of length L and begins (and ends) a certain positive or negative number of samples (or fractions of samples) away from the zero-shift position. Expressed another way, each shifted segment begins at sf_start + shift and ends at sf_end + shift. As shown in Figure 10, the range of possible values for shift is ±limit.

For example, one possible shift would be shift = -limit. In this case, the L-length segment of x(i) defined by such a shift would begin at location sf_start - limit and end at location sf_end - limit. Similarly, another possible shift would be shift = +limit. In this case, the L-length segment of x(i) defined by such a shift would begin at location sf_start + limit and end at location sf_end + limit. As mentioned above, ±limit specifies a range of possible shifts. Therefore, shift may take on values in the range -limit ≤ shift ≤ +limit, given a shift step size (i.e., shift precision) of sstep. Step size sstep may illustratively be set to 0.5 samples. Sample values resulting from fractional shifts are determined by conventional bandlimited interpolation. A plurality of 2 × limit/sstep + 1 segments of the original residual signal x(i) may be defined in this way (e.g., 25 segments for limit = 6 and sstep = 0.5). All are L-length segments between ±limit, wherein each segment overlaps its neighbor segments and is offset from its nearest neighbor segments by sstep samples.

The relative sizes of limit and extra affect system performance. For example, as extra is made larger, greater coding delay is introduced into the system. As extra is made smaller, coding delay is reduced, but the probability that shift will take on a value which excludes a pitch-pulse from the L-length segment of x(i) increases. This exclusion, when it occurs, causes audible distortion in the speech signal. The probability of exclusion also increases as limit is made larger. To help ensure that exclusion does not occur, the value of limit should be less than the value of extra. For example, if the value of extra is 10, limit may be set to 6.

For each L-length segment of x(i) thus identified, a measure of similarity between the segment and an L-length segment of the adaptive codebook contribution, x̂(i), is computed. This computation is illustratively a cross-correlation. The adaptive codebook segment used for each cross-correlation begins at dpm and ends at dpm + L (see Figure 8). The cross-correlation is performed with a step size equal to sstep (should sstep equal a non-integer value, conventional bandlimited interpolation of x(i) is performed in advance to provide the requisite sample values for the segments of x(i) and x̂(i)). Each cross-correlation results in a cross-correlation value (i.e., the measure of similarity). All such cross-correlations form a set of cross-correlation values separated in time by sstep. Each cross-correlation value of the set is therefore associated with the shift corresponding to the L-length segment of x(i) used in the computation of that value.

Once the set of cross-correlation values is determined, the segment of the original residual signal having the greatest cross-correlation with the adaptive codebook segment is determined with increased time resolution (see step 450). Illustratively, this is done by fitting a second-order polynomial curve to each set of three consecutive cross-correlation values (consecutive sets share two of their three values). The middle value of the three cross-correlation values in a set corresponds to a shifted original residual signal as described above. The set of three cross-correlation values, and thus the associated polynomial curve, is identified by this middle value and its associated shift. For each such curve, a maximum and the location of that maximum (loc_max) are determined. (If loc_max is outside the range of the three values, the three values and the associated curve are disregarded.) The curve having the greatest maximum value identifies the shift of the original residual signal which produces the best match with the segment of the adaptive codebook contribution.

The shift of the original residual signal producing the best match is refined using the location of the maximum of the polynomial curve having the greatest maximum. With the location of the maximum, loc_max, expressed relative to the middle of the three cross-correlation values associated with the curve (i.e., relative to a value of shift, in units of sstep), shift may be refined as shift = shift + sstep · loc_max.
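For three consecutive values a, b, c at relative positions -1, 0, +1 (in units of sstep), the fitted parabola is b + (c - a)t/2 + (a - 2b + c)t²/2. Its maximum lies at loc_max = (a - c) / (2(a - 2b + c)) with value b - (c - a)² / (8(a - 2b + c)), provided a - 2b + c < 0. The C sketch below applies this to every triple, discards vertices outside [-1, 1] as the text describes, and refines the shift of the curve with the greatest maximum. The function and parameter names are hypothetical.

```c
#include <assert.h>

/* corr[k] is the cross-correlation value for shift[k]; consecutive shifts
 * differ by sstep. Returns 1 and writes the refined shift if some triple
 * has a maximum within its own range, otherwise returns 0. */
static int refine_shift(const float *corr, const float *shift, int n,
                        float sstep, float *best_shift)
{
    assert(sstep > 0.0f);
    int found = 0;
    float best_val = 0.0f;
    for (int k = 1; k + 1 < n; k++) {
        float a = corr[k - 1], b = corr[k], c = corr[k + 1];
        float curv = a - 2.0f * b + c;
        if (curv >= 0.0f)
            continue;                          /* curve has no maximum */
        float loc_max = (a - c) / (2.0f * curv);
        if (loc_max < -1.0f || loc_max > 1.0f)
            continue;                          /* outside the three values */
        float peak = b - (c - a) * (c - a) / (8.0f * curv);
        if (!found || peak > best_val) {
            found = 1;
            best_val = peak;
            *best_shift = shift[k] + sstep * loc_max;  /* refined shift */
        }
    }
    return found;
}
```

The text does not say what happens when no triple qualifies (for example, a correlation that rises monotonically over the whole range); a caller might fall back to the shift of the largest sampled value, but that choice is an assumption.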

At this point, the best shift of the original residual signal has been determined. This shift may then be used to extend the shifted residual, x̃(i), for a duration L. Since this shift is known, the accumulated shift between the original residual signal, x(i), and the shifted residual signal, x̃(i), may be updated as acc_shift = acc_shift + shift (see step 475).

With the accumulated shift updated, the shifted residual signal, x̃(i), is extended to match acc_shift with use of the segment of the original residual signal corresponding to shift. Note that original residual sample values are available only at original signal sample times. However, in determining an optimal shift of the original residual signal, upsampling has been performed prior to computing cross-correlations, and a value loc_max (which is generally non-integer) has been determined. In general, this results in a non-integer sample-time relationship between the shifted residual signal x̃(i) and the original residual signal x(i) to be used in extending the shifted residual signal. Therefore, bandlimited interpolation of the L-length segment of the original signal is used to provide sample values of the original signal which are time-aligned with samples of the shifted residual. Once such time alignment is performed, the samples of this time-aligned signal may be concatenated with the existing shifted residual signal (see step 480).

Note that the flow of control may have jumped to step 480 without updating the accumulated shift. In this case, L samples of the original signal are interpolated to provide samples for the shifted residual with the same value of acc_shift as the previous shifted residual segment.

In either case, dpm is updated to reflect the extension of x̃(i) (see step 490).

As shown in Figure 9, once dpm is updated, the flow of control returns to step 305. As mentioned above, step 305 determines whether further processing is required to extend the shifted residual beyond the end of the current subframe. If so, control flows through steps 310-490 of Figure 9 again so that the shifted residual may be extended further. Steps 310-490 are repeated as long as the condition of step 305 is satisfied. Once the shifted residual has been extended up to or beyond the end of the current adaptive codebook subframe, the pointer to the end of the adaptive codebook subframe is updated (see step 500) and processing associated with time-shifting the original residual ends.

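Once x̃(i) is determined by time shift processor 200, a scale factor λ(i) is determined by processor 210 as follows:

$$\lambda(i) = \frac{\sum_{j} \tilde{x}(j)\,\hat{x}(j)}{\sum_{j} \hat{x}^{2}(j)} \qquad (13)$$

where x̃(i) and x̂(i) are signals of length equal to a subframe, and the sums run over the samples j of the current subframe. This scale factor is multiplied by x̂(i) and provided as output from processor 200.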

Referring again to Figure 2, x̃(i) and the adaptive codebook estimate λ(i)x̂(i) are supplied to circuit 160, which subtracts the estimate λ(i)x̂(i) from the modified original x̃(i). The result is the excitation residual signal r(i), which is supplied to a fixed stochastic codebook search processor 170.
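A C sketch of the scale-factor computation and of circuit 160 follows, assuming λ is the least-squares gain (the correlation of x̃ with x̂ divided by the energy of x̂, as in equation (13)). The function name and the choice to return a zero gain for an all-zero contribution are assumptions.

```c
#include <assert.h>

/* lambda = sum(xt*xh) / sum(xh*xh) over one subframe of n samples;
 * r = xt - lambda*xh is the excitation residual passed to the FSCB search. */
static float acb_gain_and_target(const float *xt, const float *xh, int n,
                                 float *r)
{
    assert(n >= 1);
    float num = 0.0f, den = 0.0f;
    for (int i = 0; i < n; i++) {
        num += xt[i] * xh[i];
        den += xh[i] * xh[i];
    }
    float lambda = (den > 0.0f) ? num / den : 0.0f;  /* zero gain if xh == 0 */
    for (int i = 0; i < n; i++)
        r[i] = xt[i] - lambda * xh[i];               /* circuit 160 */
    return lambda;
}
```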

Codebook search processor 170 operates conventionally to determine which of the fixed stochastic codebook vectors, ζ(i), scaled by a factor, μ(i), most closely matches r(i) in a least-squares, perceptually weighted sense. The chosen scaled fixed codebook vector, μ(i)ζ_min(i), is added to the scaled adaptive codebook vector, λ(i)x̂(i), to yield the best estimate of the current reconstructed speech signal. This best estimate is stored by the adaptive codebook processor 150 in its memory.
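The conventional search can be sketched as an exhaustive least-squares match. For brevity the sketch omits the perceptual weighting filter mentioned in the text, so it is an unweighted simplification; the names and the row-major codebook layout are assumptions. For each candidate vector c, the optimal gain is mu = <r, c>/<c, c> and the remaining squared error is |r|^2 - <r, c>^2/<c, c>, so the best vector is the one that maximizes <r, c>^2/<c, c>.

```c
#include <assert.h>

/* Unweighted least-squares search over ncb vectors of length n stored
 * row-major in cb. Returns the chosen index and writes its gain to *mu. */
static int fscb_search(const float *r, const float *cb, int ncb, int n,
                       float *mu)
{
    assert(ncb >= 1 && n >= 1);
    int best = -1;
    float best_score = 0.0f, best_gain = 0.0f;
    for (int j = 0; j < ncb; j++) {
        const float *c = cb + (long)j * n;
        float rc = 0.0f, cc = 0.0f;
        for (int i = 0; i < n; i++) {
            rc += r[i] * c[i];
            cc += c[i] * c[i];
        }
        if (cc <= 0.0f)
            continue;                    /* skip an all-zero vector */
        float score = rc * rc / cc;      /* error reduction at optimal gain */
        if (best < 0 || score > best_score) {
            best = j;
            best_score = score;
            best_gain = rc / cc;
        }
    }
    if (mu)
        *mu = best_gain;
    return best;
}
```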

As is the case with conventional speech coders, the adaptive codebook scale factor and delay values, λ and M, an FSCB index, I_k, and gain, μ(i), and linear prediction coefficients, a_k, are communicated across a channel for reconstruction by a conventional CELP decoder/receiver (see Figure 13). This communication is in the form of a signal reflecting these parameters. Because of the reduced coding error afforded by operation of the illustrative embodiment of the present invention, it is possible to transmit adaptive codebook delay information, M, once per frame rather than once per subframe. Subframe values for delay may be provided at the receiver by interpolating the delay values in a fashion identical to that done by delay estimator 140 of the transmitter.

By transmitting adaptive codebook delay information M every frame rather than every subframe, the bandwidth required for delay information may be significantly reduced.

As discussed above with reference to step 475 of Figure 9, acc_shift represents an accumulated shift over time between the original signal, x(i), and the shifted signal, x̃(i). In order to prevent ever-increasing asynchrony between these signals, the delay estimator 140 can adjust computed values for M over time. An adjustment process suitable for this purpose, carried out by estimator 140, is advantageously described with reference to Figure 12.

Figure 12 presents a finite-state machine having states A, B and C. The state of this machine represents an amount of adjustment to computed values for M to prevent ever-increasing asynchrony. Transitions between states are based on values for acc_shift provided by time shift processor 200. When the machine is in state A, the delay value M(FB_{n+1}) used to determine values for delays m_n(k) is not adjusted. When in state B, the machine adjusts M(FB_{n+1}) as follows: M(FB_{n+1}) = M(FB_{n+1}) + δ, where δ illustratively equals one sample time. When in state C, the machine adjusts M(FB_{n+1}) as follows: M(FB_{n+1})

Given an initial state (A, B, or C), the finite-state machine operates by keeping track of values of acc_shift. If the value of acc_shift is such that a condition for transitioning between the current state and another state is met, a transition to the other state occurs. For example, assuming the machine is in state A (an illustrative initial state for estimator 140) and -3 ms < acc_shift < 3 ms, the machine remains in state A and M(FB_{n+1}) is not modified. If the value of acc_shift exceeds 3 ms, the machine transitions to state C and M(FB_{n+1}) is incremented by one sample time to help offset the asynchrony indicated by acc_shift. If, on the other hand, acc_shift becomes less than -3 ms while the machine is in state A, the machine transitions to state B and M(FB_{n+1}) is decremented by one sample to help offset the asynchrony. The operation is similar for states B and C.

An Alternative Illustrative Embodiment 

One alternative to the illustrative embodiment presented in Figure 2 is presented in Figure 11. In this embodiment, a trial signal generator 610 receives an original digital speech signal, x(i), and generates a plurality of trial original signals, x̃(i). The trial original signal generator 610 comprises a time-shift processor similar to that presented in Figures 2, 7, and 9, but which does not perform a correlation between a trial original signal and an adaptive codebook contribution. Rather, this time shift processor simply provides a plurality of L-length trial original signals based on a plurality of shifts of original speech signal x(i). As discussed above with reference to Figure 10, these trial original signals are L-length segments of the original signal determined by shifts of step size sstep over a range of ±limit with respect to an L-length segment beginning at sample sf_start and ending at sample sf_end. Because it performs no such cross-correlation, generator 610 does not select a trial original signal for coding on its own. Rather, it provides the trial original signals, x̃(i), that it generates to a coder/synthesizer 620 for processing.

Coder/synthesizer 620 comprises a conventional analysis-by-synthesis coder, such as the conventional CELP coder presented in Figure 1. The synthesized (or reconstructed) original signal, x̂(i), is that shown in Figure 1 as the sum of the adaptive and fixed codebook output signals, e(i) + λ(i)x̂(i - d(i)) (see circuit 45 of Figure 1). The coded signal parameters determined by the analysis processing of the CELP coder (from which the synthesized signal x̂(i) is generated) may be saved in RAM for later use. The output of the coder/synthesizer 620, x̂(i), is thus an estimate of the original signal, x(i), based on a given trial original signal, x̃(i). This estimate of the original signal is thereafter compared with the trial original signal to determine a measure of the similarity between the estimated original, x̂(i), and the trial original, x̃(i). The comparison is performed by a subtraction circuit 630, which determines a difference (or error) signal, ε(i), between the two signals. The error signal ε(i) is provided to the trial signal generator 610, which keeps track of the error associated with a given trial original signal. Once all trial original signals have been processed in this way, the trial signal generator may determine which trial signal produced the best measure of similarity (e.g., the smallest error). Thereafter, generator 610 may signal the coder/synthesizer 620 to use the saved code parameters associated with the trial original signal having the smallest error. These parameters may be communicated to a receiver as a coded representation of the original signal, x(i).
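The selection performed by generator 610 can be sketched in C as follows, assuming the coder/synthesizer has already been run on each trial and its synthesized output stored, and using the energy of ε(i) as the error measure. The function name and the row-major layout (one row per trial) are hypothetical; saving and recalling the code parameters is not modeled.

```c
#include <assert.h>

/* trials and synth are ntrials x n, row-major; synth row t is the
 * coder/synthesizer output for trial row t. Returns the index of the
 * trial with the smallest squared error and writes that error. */
static int best_trial(const float *trials, const float *synth, int ntrials,
                      int n, float *min_err)
{
    assert(ntrials >= 1 && n >= 1);
    int best = 0;
    float best_err = 0.0f;
    for (int t = 0; t < ntrials; t++) {
        float e = 0.0f;
        for (int i = 0; i < n; i++) {
            /* epsilon(i) as produced by subtraction circuit 630 */
            float d = trials[t * n + i] - synth[t * n + i];
            e += d * d;
        }
        if (t == 0 || e < best_err) {
            best = t;
            best_err = e;
        }
    }
    if (min_err)
        *min_err = best_err;
    return best;
}
```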

It will be understood by those of ordinary skill in the art that reference to signals such as the "original" signal, "reconstructed" signal, etc., may include reference to segments thereof. Moreover, whether a given signal is upsampled or not does not change its character as an "original" signal, a "trial original" signal, etc. Hence, use of the term "samples" with reference to, e.g., an "original signal" may include those sample values of the signal provided by an upsampling technique (such as conventional bandlimited interpolation), those samples which are not the result of upsampling, or both.

Introduction to Appendix 

Attached as an appendix hereto is an illustrative set of software programs related to the first illustrative embodiment discussed above. The software programs of this set are written in the "C" programming language. An embodiment of this invention may be provided by executing these programs on a general-purpose computer, for example, the Iris Indigo workstation marketed by Silicon Graphics, Inc. Note that subroutines "cshiftframe" and "modifyorig" correspond generally to those functions presented in Figure 9.

#include "macro.h"
/*
 * mod - modify residual
 */

void mod(residualm, accshift, d_shift, shiftr, exctation, residual,
         dpl, dpm, lpcw, lpcorder, delay, subframel, extra, fcnt)
float *residualm;   /* output: modified residual signal */
float *accshift;    /* output: shift from mresidual to residual */
float *d_shift;     /* output: local shift for all samples */
float shiftr;       /* input: maximum shift range */
float *exctation;   /* input: adaptive codebook excitation */
float *residual;    /* input: original residual */
int dpl;            /* input: pointer to output signals */
int *dpm;           /* in/out: pointer to end of residualm */
float *lpcw;        /* input: weighted lpc coefficients */
int lpcorder;       /* input: lpc order */
float delay;        /* input: delay */
int subframel;      /* input: subframe length */
int extra;          /* input: additional excitation constructed */
long fcnt;          /* input: frame counter (DEBUG) */
{
    void cshiftframe();
    void modifyorig();
    float shiftr2;
    int sfstart, sfend;

    /* extend residualm until it reaches the end of the current subframe */
    while (*dpm < dpl + subframel) {
        cshiftframe(&sfstart, &sfend, &shiftr2, *dpm, residual, dpl,
                    *accshift, shiftr, delay, subframel, extra, fcnt);
        modifyorig(residualm, accshift, d_shift, dpm, shiftr2, exctation,
                   residual, sfstart, sfend);
    }
}



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#includ€ "macro. h*" 
/* 

• cshiftframe - find optimal frame shift 



void cshiftframe ( 


sf start , 


sf end. 


maxshif t2, dpm, residual, dpi, accshift. 




maxshif t. 


delay. 


subframel, extra, fcnt) 


inc *sfstart; 


/* 


output : 


shift-frame start */ 


int *sfend; 


/* 


output : 


shift-frame ending */ 


float *maxshift2; 


/* 


output : 


one-sided shift range */ 


int dpm; 


/* 


output : 


up to where residualm exists */ 


float * residual; 


/* 


input : 


original residual signal */ 


int dpi; 


/* 


input : 


output signal pointer */ 


float accshift; 


/* 


input : 


shift of output versus input */ 


float maxshift; 


/* 


input : 


maximum shift range */ 


float delay; 


/* 


input : 


local pitch value */ 


int subframel; 


/* 


input : 


subframe length */ 


int extra; 


/* 


input : 


additional excitation beyond current frme 


long font; 
I 


/* 


input : 


frame counter (DEBUG) */ 


void maxelocO 




/* 


determine location of max energy *. i 



float maxener; 



int offset; 

int iacshift; 

int length; ^ 

int loc, loc2; 

if { delay < 0) { 

iacshift = -accshift + 0.5; 
iacshift = -iacshift; 

) 

else 

iacshift = -accshift + 0.5; 

    /* determine first a pitch pulse somewhere near dpm */
    length = 1.5 * delay;
    offset = dpm + iacshift - 0.25 * delay;
    maxeloc(&loc, &maxener, residual, offset, length, 2);
    loc -= iacshift;
    printf("cshiftframe: firstloc %d", loc - dpl);

    /* now find the first pitch pulse for sure */
    if (loc < dpm) {
        offset = loc + iacshift + 0.75 * delay + 0.5;
        length = 0.5 * delay;
        maxeloc(&loc, &maxener, residual, offset, length, 2);
        loc -= iacshift;
        printf(" Aloc %d", loc - dpl);
    }

    if (loc > dpm + delay) {
        offset = loc + iacshift - 1.25 * delay + 0.5;
        length = 0.5 * delay;
        maxeloc(&loc2, &maxener, residual, offset, length, 2);
        loc2 -= iacshift;
        if (loc2 >= dpm) loc = loc2;
        printf(" Bloc %d", loc - dpl);
    }






EP 0 602 826 A2 



18 14:56 1992 cshiftframe.c Page 2



    *sfstart = dpm;
    *sfend = loc + extra;
    *maxshift2 = maxshift;

    if (*sfend > dpl + subframel + extra)
        *sfend = dpl + subframel + extra;

    if (loc >= dpl + subframel + extra/2)
        *sfend = dpl + subframel;

    if (loc >= *sfend || loc < *sfstart)
        *maxshift2 = 0;

    printf(" loc is: %d\n", loc - dpl);
    /* debugging pictures */



    {
        char title[100];
        static float w1[200], w2[200];
        register int i;

        for (i = 0; i < subframel + extra; i++) w2[i] = residual[dpl + iacshift + i];
        for (i = 0; i < *sfstart - dpl; i++) w2[i] = 0.0;
        for (i = *sfend - dpl; i < subframel + extra; i++) w2[i] = 0.0;
    }
}
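cshiftframe (and bl_intrp below) rounds a real-valued shift to the nearest integer by adding 0.5 to its magnitude, truncating with an int assignment and then restoring the sign. The sign split matters because C converts floating values to int by truncating toward zero, so a plain `(int)(x + 0.5)` would map -1.3 to 0 instead of -1. A minimal sketch of that idiom as a standalone helper follows; the name `round_half_away` and the overflow assertion are illustrative assumptions, not part of the listing.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper isolating the listing's rounding idiom:
 * round x to the nearest integer, with halves rounded away from zero. */
int round_half_away(double x)
{
    int r;

    /* keep the int conversions below inside the representable range */
    assert(x > (double)INT_MIN + 1.0 && x < (double)INT_MAX - 1.0);
    if (x < 0.0) {
        r = (int)(-x + 0.5);   /* round the magnitude: truncating a positive value */
        r = -r;                /* restore the sign */
    } else {
        r = (int)(x + 0.5);
    }
    return r;
}
```

For example, `round_half_away(-1.3)` returns -1 and `round_half_away(2.5)` returns 3.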
#include "macro.h"

void maxeloc(maxloc, maxener, signal, dp, length, ewl)
int *maxloc;        /* output: location of maximum energy */
float *maxener;     /* output: energy at loc */
float *signal;      /* input: signal for which energy is to be found */
int dp;             /* input: data pointer into signal */
int length;         /* input: window of data */
int ewl;            /* input: half length of energy window */
{

    float ener;
    register int i;
    int tail, front;

    ener = 0.0;
    front = dp + ewl;
    tail = dp - ewl;

    for (i = tail; i <= front; i++)
        ener += signal[i] * signal[i];
    *maxloc = dp;
    *maxener = ener;
    for (i = 1; i < length; i++) {
        front++;
        ener += signal[front] * signal[front] - signal[tail] * signal[tail];
        tail++;
        if (*maxener < ener) {
            *maxloc = i + dp;
            *maxener = ener;
        }
    }
}
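maxeloc scans `length` candidate centres starting at `dp` and keeps a running energy over a window of 2*ewl+1 samples, so each step costs one addition and one subtraction rather than a full recomputation. The sketch below restates it in ANSI C and returns the location instead of writing it through a pointer; the name `max_energy_location` is hypothetical, and like the listing it assumes the caller guarantees that every sample the window touches, `signal[dp-half_win]` through `signal[dp+length-1+half_win]`, lies inside the array.

```c
#include <assert.h>

/* Hypothetical ANSI-C restatement of maxeloc: return the centre in
 * dp .. dp+length-1 whose window of 2*half_win+1 samples has the largest
 * energy, and store that energy in *max_energy. */
int max_energy_location(const float *signal, int dp, int length,
                        int half_win, float *max_energy)
{
    int i, best = dp;
    int front = dp + half_win, tail = dp - half_win;
    float ener = 0.0f;

    assert(length >= 1 && half_win >= 0);
    for (i = tail; i <= front; i++)      /* energy of the first window */
        ener += signal[i] * signal[i];
    *max_energy = ener;

    for (i = 1; i < length; i++) {
        front++;                          /* sample entering on the right */
        ener += signal[front] * signal[front] - signal[tail] * signal[tail];
        tail++;                           /* sample that just left on the left */
        if (ener > *max_energy) {         /* strict: ties keep the earliest centre */
            *max_energy = ener;
            best = dp + i;
        }
    }
    return best;
}
```

For a signal that is zero except `sig[9] = 1`, `sig[10] = 3`, `sig[11] = 1`, the call `max_energy_location(sig, 3, 14, 1, &e)` returns 10 with `e == 11`.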






#include "macro.h"
/*
 * modifyorig - modify original
 */

void modifyorig(residualm, accshift, d_shift, dpm, shiftrange,
                exctation, residual, dpl, sfend)
float *residualm;   /* output: modified residual signal */
float *accshift;    /* in/out: accumulated shift */
float *d_shift;     /* output: local shift value */
int *dpm;           /* output: first nonvalid sample of residualm */
float shiftrange;   /* input: one side of shift range */
float *exctation;   /* input: excitation waveform */
float *residual;    /* input: original residual signal */
int dpl;            /* input: window start */
int sfend;          /* input: window end */
{

    void bl_intrp();
    void getcrit();
    void testi_ubound();
    int k;

    float criterion, best;
    float shift;
    float optshift;
    float locmax;

    int leftlimit, rightlimit;
    int length;
#define MAXDIM 100

    float crit[MAXDIM];
    float a, b;
    float sstep;

    length = sfend - dpl;

    /* first we upsample by a factor 2 */
    sstep = 0.5;

    rightlimit = shiftrange/sstep + 0.5;
    leftlimit = -rightlimit;

    if (leftlimit == rightlimit) rightlimit = leftlimit - 1;
    printf("modifyorig: llim %d rlim %d", leftlimit, rightlimit);
    testi_ubound(rightlimit*2+1, MAXDIM, "modifyorig.c");
    for (k = leftlimit; k <= rightlimit; k++) {
        shift = *accshift + k * sstep;
        getcrit(crit+k-leftlimit, residual+dpl, exctation+dpl, shift, length);
    }

    /* then we interpolate the criterion */
    best = 0.0;
    optshift = *accshift;

    for (k = leftlimit + 1; k < rightlimit; k++) {
        shift = *accshift + k * sstep;
        criterion = -2.0;
        /* parabola through crit[k-1], crit[k], crit[k+1]:
           a is twice its curvature, b twice its slope at k */
        a = crit[k-leftlimit+1] + crit[k-leftlimit-1] - 2.0 * crit[k-leftlimit];
        if (a != 0.0) {
            b = crit[k-leftlimit+1] - crit[k-leftlimit-1];
            locmax = -b / (2.0 * a);
            if (locmax <= 0.5 && locmax >= -0.5)
                criterion = a * locmax * locmax + b * locmax + 2.0 * crit[k-leftlimit];
        }
        if (criterion > best) {
            optshift = shift + sstep * locmax;
            best = criterion;
        }
    }

    *accshift = optshift;

    printf(" optshift %5.2f best %.4e\n", optshift, best);
    if (best <= 1.0)
        for (k = leftlimit+1; k < rightlimit; k++)
            printf("k=%d %f\n", k, crit[k-leftlimit]);
    for (k = 0; k < length; k++) {
        bl_intrp(residualm+dpl+k, residual+dpl+k, *accshift, 0.9, 8);
        d_shift[dpl+k] = *accshift;
    }

    *dpm = dpl + length;
}
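modifyorig evaluates the criterion on a half-sample grid and refines the best grid point by fitting a parabola through three neighbouring values. For y(x) = Ax² + Bx + C through x = -1, 0, 1, one has a = 2A = y(1) + y(-1) - 2y(0) and b = 2B = y(1) - y(-1), so the vertex lies at -b/(2a) and the listing's `criterion` equals twice the interpolated peak. The sketch below isolates that step; the name `parabolic_peak` is hypothetical, and unlike the listing, which only requires `a != 0.0`, it also rejects a ≥ 0 because a flat or upward-opening parabola has no maximum. Treat that as a design choice of the sketch.

```c
#include <assert.h>

/* Hypothetical helper: refine a grid maximum with a parabola through
 * y_m = c(k-1), y_0 = c(k), y_p = c(k+1). On success, *offset is the
 * vertex position in grid steps relative to k and *peak the interpolated
 * value; returns 0 if there is no maximum inside [-0.5, 0.5]. */
int parabolic_peak(double y_m, double y_0, double y_p,
                   double *offset, double *peak)
{
    double a = y_p + y_m - 2.0 * y_0;   /* twice the curvature */
    double b = y_p - y_m;               /* twice the slope at k */
    double x;

    assert(offset && peak);
    if (a >= 0.0)                       /* flat or convex: no maximum */
        return 0;
    x = -b / (2.0 * a);
    if (x < -0.5 || x > 0.5)            /* vertex belongs to a neighbouring point */
        return 0;
    *offset = x;
    *peak = 0.5 * (a * x * x + b * x + 2.0 * y_0);
    return 1;
}
```

The refined shift is then the grid shift at k plus `sstep * offset`. Samples of -(x - 0.25)² + 4 at x = -1, 0, 1 (2.4375, 3.9375, 3.4375) give an offset of 0.25 and a peak of 4.0.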

#include "macro.h"
/*
 * bl_intrp - band-limited interpolation
 */

void bl_intrp(output, input, delay, factor, fl)
float *output;      /* output: interpolated output value */
float *input;       /* input: array to be interpolated */
float delay;        /* input: delay where actual input is */
float factor;       /* input: cut-off frequency (relative to fs/2) */
int fl;             /* input: filter length is 2*fl+1 */
{
    /* NOTES
     * computes the "input" signal value located "delay" samples
     * prior to the array pointer "input" in the "input" array.
     */



    register int n;
    register float t;
    register float *f;
    register float arg1, arg3;
    register float denom;
    int offset;

    if (delay < 0) {
        offset = -delay + 0.5;
        offset = -offset;
    }
    else
        offset = delay + 0.5;
    t = offset - delay;

    f = input - offset;     /* center sum around f */

    denom = 2.0 / (2.0 * fl + 1.0);

    *output = 0.0;

    for (n = -fl; n <= fl; n++) {
        arg1 = PI * factor * (t-n);
        arg3 = PI * (t-n);
        if (arg1 < 1.e-2 && arg1 > -1.e-2)     /* just copy */
            *output += factor * *(f+n);
        else    /* sinc function multiplied by Hamming window */
            *output += factor * (0.54 + 0.46 * cos(arg3 * denom)) *
                       *(f+n) * sin(arg1) / arg1;
    }
}



/*
 * testi_ubound - test if argument a exceeds int boundary b and print text
 */
void testi_ubound(a, b, text)
int a;          /* input: value to be tested */
int b;          /* input: boundary value */
char *text;     /* input: program name */
{
    if (a > b) {
        printf("\n%s-f-value exceeds range %d > %d\n", text, a, b);
        exit(10);
    }
}

/*
 * testi_bound - test if argument a exceeds range b1,b2 and print text
 */
void testi_bound(a, b1, b2, text)
int a;          /* input: value to be tested */
int b1, b2;     /* input: boundary values */
char *text;     /* input: program name */
{
    if (a < b1) {
        printf("\n%s-f-value exceeds range %d < %d\n", text, a, b1);
        exit(10);
    }
    else if (a > b2) {
        printf("\n%s-f-value exceeds range %d > %d\n", text, a, b2);
        exit(10);
    }
}

/*
 * testf_bound - test if argument a exceeds range b1,b2 and print text
 */
void testf_bound(a, b1, b2, text)
float a;        /* input: value to be tested */
float b1, b2;   /* input: boundary values */
char *text;     /* input: program name */
{
    if (a < b1) {
        printf("\n%s-f-value exceeds range %f < %f\n", text, a, b1);
        exit(10);
    }
    else if (a > b2) {
        printf("\n%s-f-value exceeds range %f > %f\n", text, a, b2);
        exit(10);
    }
}

/*
 * testd_bound - test if argument a exceeds range b1,b2 and print text
 */
void testd_bound(a, b1, b2, text)
double a;       /* input: value to be tested */
double b1, b2;  /* input: boundary values */
char *text;     /* input: program name */
{
    if (a < b1) {
        printf("\n%s-f-value exceeds range %f < %f\n", text, a, b1);
        exit(10);
    }
    else if (a > b2) {
        printf("\n%s-f-value exceeds range %f > %f\n", text, a, b2);
        exit(10);
    }
}



#include "macro.h"
/*
 * getcrit - compute error between excitation and shifted residual
 */

void getcrit(criterion, residual, exctation, shift, length)
float *criterion;   /* output: error criterion */
float *residual;    /* input: residual signal */
float *exctation;   /* input: reference signal */
float shift;        /* input: shift */
int length;         /* input: vector length */
{
    void bl_intrp();
    float output;
    register int i;

    *criterion = 0.0;

    for (i = 0; i < length; i++) {
        bl_intrp(&output, residual+i, shift, 0.9, 8);
        *criterion += output * exctation[i];
    }
}
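getcrit scores one candidate shift as the inner product between the excitation and the residual read `shift` samples earlier, and modifyorig calls it over a grid of shifts and keeps the best one. The sketch below shows the same search restricted to integer shifts, so plain indexing replaces bl_intrp; `shifted_correlation` and `best_integer_shift` are hypothetical names. The caller must make `residual[-max_shift]` through `residual[length - 1 + max_shift]` valid.

```c
#include <assert.h>

/* Unnormalised cross-correlation between the excitation and the residual
 * delayed by an integer number of samples (getcrit allows fractional
 * shifts through bl_intrp; this sketch does not). */
double shifted_correlation(const double *residual, const double *excitation,
                           int shift, int length)
{
    double acc = 0.0;
    int i;

    for (i = 0; i < length; i++)
        acc += residual[i - shift] * excitation[i];   /* residual read "shift" samples earlier */
    return acc;
}

/* Evaluate every shift in [-max_shift, max_shift] and return the one with
 * the largest correlation (ties keep the smaller shift). */
int best_integer_shift(const double *residual, const double *excitation,
                       int max_shift, int length)
{
    int s, best = -max_shift;
    double best_val;

    assert(max_shift >= 0 && length > 0);
    best_val = shifted_correlation(residual, excitation, -max_shift, length);
    for (s = -max_shift + 1; s <= max_shift; s++) {
        double v = shifted_correlation(residual, excitation, s, length);
        if (v > best_val) {
            best_val = v;
            best = s;
        }
    }
    return best;
}
```

With a single excitation pulse at index 5 and the matching residual pulse at index 3, `best_integer_shift(res, exc, 4, 16)` returns 2.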



Claims 

1. A method for coding an original signal, the method comprising the steps of: 

a. identifying one or more samples of the original signal based on a sample identification criterion; 

b. selecting a segment of the original signal to form a trial original signal, the segment including one 
or more of the identified samples;

c. for each of a plurality of trial original signals, evaluating a measure of similarity between the trial 
original signal and a synthesized signal; 

d. determining a trial original signal for use in coding based on one or more evaluated measures of 
similarity; and 

e. generating a signal reflecting a coded representation of the original signal, the signal generation 
based on one or more determined trial original signals. 

2. The method of claim 1 further comprising the steps of: 

1. analyzing one or more trial original signals to produce one or more parameters representative 
thereof; and 

2. synthesizing a signal which estimates the original signal, the synthesis based on one or more of 
the parameters. 
3. The method of claim 1 wherein the step of identifying one or more samples of the original signal
comprises analyzing the original signal to locate a local energy maximum. 

4. The method of claim 1 wherein the selected segment of the original signal comprises original signal
samples other than the identified signal samples.

5. The method of claim 4 wherein the selected segment comprises identified samples preceding one or 
more other original signal samples. 

6. The method of claim 1 wherein the step of selecting a segment comprises:

1. determining a time shift with reference to one or more samples of the original signal; and

2. determining a set of original signal samples based on the time shift. 

7. The method of claim 1 wherein the step of evaluating a measure of similarity comprises forming a
cross-correlation between the trial original signal and the synthesized signal.

8. The method of claim 1 wherein the step of determining a trial original signal for use in coding 
comprises the step of selecting a trial original signal from among the plurality of trial original signals, 
the selection of the trial original signal based upon a comparison of evaluated measures of similarity. 


9. The method of claim 1 wherein the step of determining a trial original signal for use in coding 
comprises the step of generating a trial original signal based on evaluated measures of similarity. 

10. The method of claim 9 wherein the step of generating a trial original signal comprises:

1. determining a substantially maximum measure of similarity from among a plurality of trial original
signal similarity measures; and

2. determining a time-shift reflecting the substantial maximum measure of similarity.

11. The method of claim 10 wherein the step of generating a trial original signal further comprises
determining sample values for the trial original signal based on a formed trial original signal and the
time-shift.

12. The method of claim 10 wherein the step of generating a trial original signal further comprises 
determining sample values for the trial original signal based on the original signal and the time-shift. 


13. The method of claim 1 wherein the step of generating a signal reflecting a coded representation of the 
original signal comprises coding one or more determined trial original signals. 

14. The method of claim 13 wherein the step of coding one or more trial original signals comprises
performing analysis-by-synthesis coding.

15. The method of claim 14 wherein the step of performing analysis-by-synthesis coding comprises 
performing code-excited linear prediction coding. 

16. An apparatus for coding an original signal, the apparatus comprising:

a. means for identifying one or more samples of the original signal based on a sample identification 
criterion; 

b. means for selecting a segment of the original signal to form a trial original signal, the segment
including one or more of the identified samples;

c. means for evaluating a measure of similarity between each of a plurality of trial original signals
and a synthesized signal;

d. means for determining a trial original signal for use in coding based on one or more evaluated 
measures of similarity; and 

e. means for generating a signal reflecting a coded representation of the original signal, the signal
generation based on one or more determined trial original signals.

17. The apparatus of claim 16 further comprising: 
1. means for analyzing one or more trial original signals to produce one or more parameters
representative thereof; and 

2. means for synthesizing a signal which estimates the original signal, the synthesis based on one or 
more of the parameters. 


18. The apparatus of claim 16 wherein the means for identifying one or more samples of the original signal 
comprises a means for analyzing the original signal to locate a local energy maximum. 

19. The apparatus of claim 16 wherein the means for selecting a segment comprises: 

1. means for determining a time shift with reference to one or more samples of the original signal;
and

2. means for determining a set of original signal samples based on the time shift. 

20. The apparatus of claim 16 wherein the means for generating a signal reflecting a coded representation
of the original signal comprises means for coding one or more determined trial original signals.

21. The apparatus of claim 20 wherein the means for coding one or more trial original signals comprises 
means for performing analysis-by-synthesis coding. 

22. The apparatus of claim 21 wherein the means for performing analysis-by-synthesis coding comprises
means for performing code-excited linear prediction coding. 